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ASYMPTOTICS AND OPTIMAL BANDWIDTH SELECTION FOR 
HIGHEST DENSITY REGION ESTIMATION 1 

By R. J. Samworth and M. P. Wand 

University of Cambridge and University of Wollongong 

We study kernel estimation of highest-density regions (HDR). 
Our main contributions are two-fold. First, we derive a uniform- in- 
bandwidth asymptotic approximation to a risk that is appropriate 
for HDR estimation. This approximation is then used to derive a 
bandwidth selection rule for HDR estimation possessing attractive 
asymptotic properties. We also present the results of numerical stud- 
ies that illustrate the benefits of our theory and methodology. 

1. Introduction. A highest-density region (HDR) for a measurement of 
interest is a region where the underlying density function exceeds some nom- 
inal threshold. Given a random sample from that density, HDR estimation 
typically involves determination of regions where an estimated density is 
high. Kernel density estimation is the most common approach, but its per- 
formance is heavily dependent on the choice of the bandwidth parameter. 
Automatic selection of a good bandwidth for HDR estimation is the overar- 
ching goal of this article. 

Figure 1 illustrates the bandwidth selection issue for HDR estimation. The 
left panel shows five kernel density estimates based on random samples of size 
1000 from the normal mixture §iV(0, 1) + ^(0,^) density [Density 4 of 
Marron and Wand (1992)]. In each case the bandwidth is chosen to minimize 
the integrated squared error (ISE). In the right panel the same random 
samples are used, but, instead, the bandwidths are chosen to minimize an 
error appropriate for estimation of the 20% HDR (defined formally in Section 
2). This region is shown as a thick horizontal line at the base of the plot. It 
is clear from Figure 1 that optimality for HDR estimation is quite different 
from ISE-optimality. Low ISE requires that the two curves be close to each 
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Fig. 1. Left panel: five kernel density estimates based on random samples of size 1000 
simulated from the density depicted by the dashed curve. Each estimate is based on the 
optimal bandwidth with respect to integrated squared error. Right panel: same as the left 
panel except that the bandwidth is chosen to minimize the error for estimation of the 20% 
highest- density region. This region is shown as a thick horizontal line at the base of the 
plot and its boundaries are shown as dashed vertical lines. 

other over the whole real line. However, good estimation of the 20% HDR 
only requires that the 20% HDRs of the kernel density estimates are close 
to the true region. In particular, the sharp mode of the underlying density 
has no bearing upon the HDR and there is no need to estimate it well. 
For this density it is apparent that a bandwidth considerably larger than 
ISE-optimal bandwidth is best for estimation of the 20% HDR. 

In this article we study an asymptotic risk associated with kernel-based 
HDR estimation and use our theory to develop a plug-in type bandwidth 
selector. Attractive asymptotic properties of our bandwidth selector are es- 
tablished and good performance is illustrated on simulated data. A self- 
contained function for use in the R environment [R Development Core Team 
(2008)] is made available on the Internet. 

The HDR estimation problem has an established literature. Contributions 
include Hartigan (1987), Miiller and Sawitzki (1991), Polonik (1995), Hyn- 
dman (1996), Tsybakov (1997), Bafllo, Cuesta-Albertos and Cuevas (2001), 
Bafllo (2003), Cadre (2006), Jang (2006), Rigollet and Vert (2009) and Ma- 
son and Polonik (2009). Mason and Polonik (2009) provide a thorough lit- 
erature review for the problem. Alternative terminology includes estimation 
of the density contours, density level sets and excess mass regions. This lit- 
erature is, however, mainly concerned with theoretical results unconnected 
with the bandwidth selection problem. Jang (2006) is an applied paper on 
the use of HDR estimation for astronomical sky surveys. However, the band- 
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widths used there are chosen via classical ISE-based plug-in strategies. The 
present paper is, to our knowledge, the first to derive theory and bandwidth 
selection rules that are specifically tailored to the HDR estimation problem. 

While our proposed practical bandwidth selector relies on asymptotic 
approximations, its development comes at a time when sample sizes in ap- 
plications that benefit from smoothing techniques are becoming very large. 
The area of application that led to this research, flow cytometry, typically 
has sample sizes in the hundreds of thousands. The astronomical applica- 
tion in Jang (2006) involves sample sizes in the tens of thousands. Another 
HDR application is approximation of the highest posterior density region 
of a parameter in a Bayesian analysis, where only a sample from that den- 
sity is available. In this situation, the sample, most typically obtained using 
Markov chain Monte Carlo methods, can arbitrarily large in size. 

Section 2 presents an approximation to the HDR asymptotic risk. Numer- 
ical studies support its use for bandwidth selection. In Section 3 we describe 
plug-in strategies for bandwidth selection. Asymptotic performance results 
are established and a simulation study demonstrates practical efficacy. We 
conclude with an example on daily temperature maxima in Melbourne, Aus- 
tralia. Proofs are deferred to an Appendix. 

2. Asymptotic risk results. Let / be a probability density function on 
the real line. For r € (0, 1), define 

fr = fAf) =infjy G (0,oo) : J f(x)l {f{x) > y} dx<l-T^. 

We call R T = {x G R : f(x) > f T } the 100(1 — r)% highest- density region of / 
[cf. Hyndman (1996)]. If (X n ) is a sequence of independent random variables 
with density /, the kernel estimator of f{x) based on X±, . . . ,X n is 

where K :R — )■ R satisfies J K(x) dx = 1, and is called a kernel and h > is 
called the bandwidth. Let = f T (fh) denote the plug-in estimator of f T , 
so that 

7/i,r = inf|y G (0,oo): J f h (x)l { ^ x ^ y ydx <1 - r\ . 
The corresponding plug-in estimator of R T is then Rh. T = {i€ M:fh(x) > 

fh,r}- 

Given two Borel subsets A and B of R, we define their proximity through 
a measure on their symmetric difference AAB = (A n B c ) U {A c n B). The 
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particular measure ///we consider is given by 

N {C)= [ f(x)dx 
Jc 

for all Borel subsets C of M. The error fif(Rf l T AR T ) is then then the prob- 
ability of an observation from / lying in precisely one of Rh, T an d R T - Com- 
pared with Lebesgue measure, [if puts more weight on regions where the 
data will tend to be denser. It also has the advantage of admitting a sim- 
ple Monte Carlo approximation. This is important in higher-dimensional 
settings where exact computation of fif(C) is difficult. 

In Theorem 1 , we derive a uniform-in-bandwidth asymptotic expansion for 
the risk E{^/(i?^ T Ai? T )}, which can facilitate a theoretical, optimal choice 
of bandwidth (cf. Corollary 2). This in turn motivates practical bandwidth 
selection algorithms whose performance is studied in Theorems 3 and 4. 
We will make use of the following conditions on the underlying density, 
bandwidth sequence and kernel: 

(Al): / is uniformly continuous on K. There exist finitely many points 
X\ < • ■ • < X2r such that f(xj) = f T for j = 1, . . . , 2r, and moreover 
there exists 5 > such that / is twice continuously differentiable 
in U T j=i[ x 2j-i - 5,x 2 j + 5] with f'(x 2 j-i) > and f'{x2j) < for 
j = l,...,r. 

(A2): Let h~ = h~ and h + = h+ be nonnegative sequences such that h~ < 
h + , such that n(/i~) 4 / \/log(l/h~) — > oo and such that h + — > as 
n — > oo. Then h = h n is a sequence with h~ <h n < for all n. 

(A3): The kernel K is nonnegative, continuously differentiable, of bounded 
variation, and satisfies f xK(x)dx = and /J-2(K) = J x 2 K(x)dx < 
oo. Moreover, K' is of bounded variation, and satisfies J* K'(x) 2 dx < 
oo. 

Assumption (Al) in particular implies that / has a 7-exponent with 7 = 1 
at level f T — in other words, there exists C > such that 

(i f ({x£R:\f(x)-f T \<e})<Ce 

for sufficiently small e > 0. This type of assumption is common in the litera- 
ture for this problem [cf. Polonik (1995), Rigollet and Vert (2009)]. Although 
there are many parts to condition (A3), none is very restrictive. Under (Al), 
f T is the unique positive real number satisfying f f(x)l^f^ > f r y dx = 1 — r. 
In fact, in the course of the proof of Theorem 1 below, we will show that 
under conditions (Al), (A2) and (A3), fh, T has an analogous property: that 

is, with probability one, for all n sufficiently large, fh jT is the unique positive 
real number satisfying 
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Let $ and eft denote the standard normal distribution function and density 
function, respectively, and write R(K) = J K 2 (x)dx. Define the quantities 

x -1 



(2.1) 



2r 



D. 



3j 



R(K)f T j * 1 



2r 1 



and 



j = l,...,2r. 



Theorem 1. Assume (Al), (A2) and (A3). TTiera 



E{ t i f (R h>T AR T )} = J2 

3=1 



(n/i)V2 

+ J B 3j /i 2 {2$( J B 2j n 1 / 2 / l 5/2 ) _ 1 } 



as n — )• oo, uniformly for h £[h , /i + ] , where 

{R(K)f T -2D 3j +D 2 } 1 / 2 



D 



1,3 



2/r- 



2,3 



and 



3,3 



{i?(/Y)/ r -2 J D3, j +vD 2 } 1 / 2 



\f'(*i)\ 



(n/i) 1/2 



The nature of this result is somewhat different from the results in the 
existing literature which have tended to focus (sometimes in more general 
settings) on the order in probability or almost surely of fj J f(Rh )T AR T ) or 
related measures [e.g., Baillo, Cuesta-Albertos and Cuevas (2001), Bai'llo 
(2003)]. More recent works have derived results on the limiting behavior of 
suitably scaled and/or centered versions of iif{Rh,T^Rr) [e.g., Cadre (2006), 
Mason and Polonik (2009)]. Rigollet and Vert (2009) provide a finite sample 
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upper bound for the risk, uniformly over certain Holder classes, with an un- 
specified constant in the bound. While these theoretical results are certainly 
of considerable interest, our aim in providing the asymptotic expansion in 
Theorem 1 is to facilitate practical bandwidth selection algorithms for this 
problem — see Section 3. 

In the course of the proof of Theorem 1, it is shown that 

R(K)f T - 2D 3tj + D 2 = lim (nh) Vax(f h ( Xj ) - f h , T ) > 0, 

n— >oo 

so that each B±j is positive. Moreover B 2 j and B% are nonnegative, and 
are positive for at least one j. Indeed, B 2 j and B^j are certainly positive 

whenever f"(xj) > J2k=i w kf"( x k), where the weights Wk oc l/\f'(xk)\ sum 
to 1. However, this condition on f"(xj) is far from necessary for B 2 j and 
i?3j to be positive. 

It is easily seen from Theorem 1 that for any sequence of bandwidths 
satisfying (A2), if nh 5 is not bounded away from zero and infinity then 
n 2 / 5 E{ / uj(i?/j r x AR T )} —7- 00 along a subsequence. On the other hand, if 
nh 5 is bounded away from zero and infinity, then n 2 / 5 E{/ij(iZ/ ljT Ai2 T )} is 
bounded. Notice that all such sequences are permitted by the condition 
(A2). Focusing our attention on bandwidth sequences of order ra -1 / 5 and 
substituting x = n 

1/2^5/2 

, we have 

2r 

lim n 2/5 E{{i f (R h T AR T )} = V 

i=i 

Writing this limit as g(x) = Y^f=i 9j ( x ) > we see that g is continuous on (0, 00) 
with g(x) — > 00 as x — > + and as x — > 00, so g attains its minimum. If j is 
such that i?2,j and B^ j are positive, then it can be shown (cf. the proof of 
Corollary 2 below), that gj has a unique minimum. This unique minimizer 
represents the asymptotically optimal bandwidth for estimating the risk in 
a small neighborhood of Xj. Although we typically expect the minimum of 
g to be unique, the complicated nature of the function g and the coefficients 
Bij, Bij and B^ j make it difficult to prove this assertion without additional 
conditions. The following corollary gives the desired result in one restricted 
case; however, we anticipate that the result in fact holds much more widely. 

COROLLARY 2. Assume (Al) and (A3). Assume further that in (Al) 
we have r = 1 and the underlying density f is symmetric about some point 
on the real line. Then there exists a unique c op t G (0, 00), depending on f 
and K but not n, such that any sequence of bandwidths (/i pt) that minimizes 
K{/j,f(Rh^ T AR T )} satisfies 

/iopt = c ptn- 1/5 {l + o(l)} 



Bi,j<fi(B 2 ,jx) 

x l/5 



+ B 



3j 



iX 



4/5 



{2*(B 



2,j 



iX 



1} 
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as n — > oo . 

The additional hypotheses on / imply that B\j, B 2) j and B%j do not 
depend on j, and in fact in the presence of (Al) and (A3), the conclusion of 
the corollary also holds under this (weaker) condition, as can be seen from 
the proof. 

2.1. Numerical assessment of risk approximation. Theorem 1 yields the 
asymptotic risk approximation 

E{fi f (R hiT AR T )} 

(2.2) 



(nh) 1 / 2 

+ B 3ij h 2 {2<S>(B 2tJ n 1 / 2 h 5 / 2 )-l} 



In Section 3 we use the right-hand side of (2.2) to develop plug-in band- 
width selection strategies. However, it is prudent to first assess the quality 
of this approximation to the risk. We now do this through some numerical 
examples. 

For a given /, h and r, the risk E{/ij(i2/j T Ai2 T )} is very difficult to obtain 
exactly. Instead, we work with a Monte Carlo approximation, 

1 M 

(2-3) -£>/(/2E T A/2r), 

i=l 

where Rh\, • ■ • , R^t are ^ simulated realisations of Rh, r - F° r large M (2.3) 

serves as reasonable proxy for M{fif(Rf l T AR T )} and is henceforth referred 
to as the "exact" risk. 

Figure 2 compares the asymptotic risk approximation with its "exact" 
counterpart for / corresponding to Densities 2, 4, 6, 8 and 10 of Marron 
and Wand (1992), and (r,n) = (0.5,1000) and (0.8,100,000). The kernel K 
is set to (p throughout, and the Monte Carlo sample size is M = 100. For 
most of these densities the asymptotic risk approximation is quite good for 
n = 1000 in the bandwidth range of interest. Density 4 is the main exception; 
it appears that larger sample sizes are required for the leading terms to be 
dominant. In particular, for this density, the difficulty appears to be caused 
by very large values of \f"\ at the crossing points of /0.5 (for Density 4, the 
level /0.5 is very close to the rapid transition from shallow to steep gradient 
seen in the corresponding upper panel in Figure 2). For several densities, the 
estimand i?o.s corresponds to the fine detail of /. It is perhaps surprising 
that even with the larger sample size, the asymptotic risk approximation is 
not always that accurate, though in some cases the approximation is very 
good indeed. 
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Marron-Wand dens. 2 Marron-Wand dens. 4 Marron-Wand dens. 6 Marron-Wand dens. 8 Marron-Wand dens. 10 




-3-2-1 1 2 3 -3 -2 -1 012 3 -3 -2 -10 1 2 3 -3 -2 -1 1 2 3 -3 -2 -1 1 2 3 




Fig. 2. Comparison of the "exact" risk E,{fj,f(Rt l , T AR T )} and its asymptotic approxi- 
mation for the five of the Marron and Wand (1992) density functions. The panels in the 
first row are the density functions, and panels in the same column correspond to the same 
density function. In each panel in the second and third row, the "exact" risk, obtained by 
averaging 100 realisations of fif(Rh, T AR T ), is shown as a solid black curve. The dashed 
curve is the asymptotic risk approximation corresponding to the right-hand side of (2.2). 
Vertical lines pass through the minima of the "exact" risk (solid line) and the asymptotic 
risk (broken line). The second row has r = 0.5 and n = 1000, while the third row has 
t = 0.8 and n = 100,000. 



3. Bandwidth selection. In this section, we assume that, as in Corollary 
2, there exists a unique c op t G (0, oo) such that any optimal bandwidth se- 
quence satisfies h opt = Coptn" 1 ^^ + o(l)}. In this case, c opt minimizes the 
asymptotic risk given by 



(3.1) AR(c) = -Lf: 



In order to find a practical bandwidth selector, we seek cin estimator c Q pt of 
c op t- The natural way to construct such an estimator is by using estimators 
D\, L>2 and D$ j of D±, D 2 and D^j, respectively, to obtain plug-in estima- 
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tors Bi j, B2J and B%j of B\j, B2J and B^j, respectively. These in turn 
can be used to find c opt = argmin ce ( 0jOO ) AR n (c), where 

1 ^ 

(3.2) AR n ,( C ) = --^ 



n 2 / 5 

.7=1 



C 



, VJ B 2j c & /^) + J B3 J c i {2d>( J B 2j c & /^) - 1} 



With probability one, the solution to this minimization problem will be 
unique for large n provided that AR"(c op t) > and this solution can easily be 
found numerically. Our final bandwidth selector is then /i t hdr, = c op t'n~ 1 ^. 

Note that we have not yet described how to construct the estimators 
D\, D2 and D^j. Again, we propose plug-in estimators based on estimates 
of f T as well as f'(xj) and f"(xj) for j = 1, ...,2r. We assume the ker- 
nel K is smooth, and will construct kernel estimators fh {%j,h )> /fti(%A>) 
and //' 2 (%/i ) of f T , f'(xj) and f"(xj), respectively, where %,/i is an es- 
timator of Xj described below. For the time being, we will use the same 
kernel K in all cases; this requirement will be dropped later on. Even at 
this stage it will, however, be important to note that we can use differ- 
ent bandwidths /io, h\ and hi- Recall [e.g., Wand and Jones (1995), page 
49] that, under appropriate conditions, if x ri _1 /( 2fc + 5 ) as n — )• 00 then 

fjf \xj) — f( k \xj) = O p (n~ 2 /( 2fc+5 )) and that this order cannot be improved 
for a nonnegative kernel. Here we have used the notation a n xb n as n — > 00 
to mean < liminf n _ i . 00 |a n /6 n | < limsup n _ J . 00 |a ri /6 ri | < 00. Finally, we ob- 
serve that if fiQ satisfies (A2), then with probability one, for all sufficiently 
large n there exist x 1)ho < ■ ■ ■ < x 2r ,h such that fh (xj,h ) = fh ,r for each j, 
and we use Xj h to estimate Xj. Our theoretical study of the performance 
of this bandwidth selector requires some additional conditions: 



(A4) 
(A5) 
(A6) 



/ has four continuous derivatives in an open set containing each Xj\ 
ho x n -1 / 5 , h\ x n -1 / 7 and /12 x n -1 / 9 as n — > 00; 
K has a bounded third derivative, K" is of bounded variation and 
/ |x| 3 x \K'(x)\ + x 4 \K"(x)\ dx < 00. 



Theorem 3. A ssume (Al) and (A3) - -(A6). Assume further that c Q pt is 
unique and that AR ;/ (c op t) > 0. Then 



"opt 

as n— >oo. Moreover, recalling that /i t hdr = c op t^ _1//5 , we have 

l + O p (n" 2 / 9 ). 



AR n (c opt ) _ 1 t ^ f ^_ 2 /9\ 



AR(c pt) 
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Examining the proof of Theorem 3 reveals that the rate of convergence 
to zero of the relative error of /i t hdr is determined by the rate at which we 
can estimate f"(xj) for j = 1, . . . , 2r. This suggests that we might be able to 
obtain a faster rate of convergence by using a higher order kernel to estimate 
f"(xj) [and in fact f'(xj)]. Recall that we call K an Sth order kernel if: 

1. fK(x)dx = l; 

2. ll s {K) = f x s K{x)dx = for s = 1, . . . , S - 1; 

3. Hs(K) = Jx s K(x)dx^0 and / \x\ s \K(x)\ dx < oo. 

Higher order kernels refer to S > 2. The usual objection to the use of higher 
order kernels, namely that such a kernel cannot be nonnegative, is less sig- 
nificant when the aim is to estimate derivatives of a density rather than the 
density itself. Let the kernels used to estimate f'{xj) and f"(xj) be denoted 
K\ and K 2 , respectively, and continue to denote the respective bandwidths 
by hi and h 2 . An improved rate of convergence of the relative error of our 
bandwidth selector can be obtained by replacing conditions (A4), (A5) and 
(A6) with the following: 

(A7): / has 12 continuous derivatives in an open set containing each Xj. 

(A8): ho x n -1 / 5 , hi x n" 1 / 15 and h 2 ^ n" 1 / 25 as n — > 00. 

(A9): Ki is a 6th order kernel and has a bounded second derivative with Ki 

and K[ of bounded variation and satisfying f x e \Ki(x) \ + \x\ 7 \K[(x)\ dx < 
00. Moreover, K 2 is a 10th order kernel and has a bounded third 
derivative with K 2 , K 2 and K 2 ' of bounded variation and satisfying 
J x 10 \K 2 {x)\ + \x\ ll \K' 2 {x)\+x l2 \K' 2 \x)\dx<oo. 

We write /i T HDR for the bandwidth selector obtained in a similar way to 
^7-HDRi but using the kernels Ki and K 2 to estimate f'(xj) and f"(xj), 
respectively, in the definitions of D±, D 2 , D^j, Bij, B 2j j and B^j. 

Theorem 4. Assume (Al), (A3) and (A7)-(A9). Assume further that 
c op t is unique and that AR"(c opt ) > 0. Then 



™ =1 + 0p (n- 2 / 5 ) 

tin 



h T 

'opt 

as n— >oo. Moreover, writing /i t hdr = c op t^~ 1//5 , we have 



AR n (c op t) _ -, , ^ 2/S>| 



AR(c pt) 



1 + O pin- 



It is clear that Theorem 3 represents a relatively weak conclusion under 
relatively weak conditions, while Theorem 4 represents a stronger conclusion 
under strong conditions. Intermediate results are also possible but seem to 
be of little practical interest. 
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3.1. An effective practical bandwidth selector. We confine our develop- 
ment of a practical consistent bandwidth selector to the scenario where / 
satisfies weaker smoothness conditions of Theorem 3. Our end-product is a 
fast-to-compute bandwidth selector for HDR estimation that possesses the 
asymptotic properties conveyed by Theorem 3, performs well in simulations 
and is readily implemented in R. Indeed, as detailed below, an R function for 
our procedure is available on the Internet. 

The pilot bandwidths ho, h\ and hi are estimated using direct plug-in 
strategies with two levels of kernel functional estimation. Chapter 3 of Wand 
and Jones (1995) provides details on this general approach to bandwidth 
selection. In the case of ho the approach is similar to those proposed by 
Park and Marron (1990) and Sheather and Jones (1991). Direct plug-in 
bandwidth selection strategies for density functions and their derivatives 
involve estimation of functionals of the form 

/oo 
fV(x)f(x)dx. 
-oo 

Kernel estimators of ip T take the form 

n n 

M9)=n- 2 g- r - 1 Y J Y, Lir) i( X - X ^/9}, 
i=l j=i 

where L is a sufficiently smooth 2nd-order kernel function, and g > is a 
bandwidth parameter. Multi-level plug-in strategies use the fact that the 
asymptotically optimal g, with respect to the mean squared error of Vv(fiO> 
is [— 2L( r \0) / {nijj r+ 2 f u 2 L(u)du}] 1 ^ r+ ' i \ To get the algorithm started we 
also require normal scale estimates of ip r , based on the assumption that / 
is a N(fi,a 2 ) density. Normal scale estimates of ip r take the form 

^ (~ 1 ) r/2 ^ ! 

Vr (2^+1(772)17^/2 ■ 

Throughout we take K = L = (ft, the standard normal kernel. The full algo- 
rithm is: 

1. The inputs are the random sample X\, . . . , X n and parameter < r < 1 
specifying the required HDR. 

2. Let a = min(sample standard deviation, (sample interquartile range)/1.349) 
be a robust estimate of scale. (The interquartile range for the standard 
normal density is approximately 1.349, so this factor ensures approxi- 
mate unbiasedness for normally distributed data.) 

3. Estimate ijjg, ipio and ipn using normal scale estimates. Explicit expres- 
sions for these are = 105/(32vr 1 / 2 a 9 ), ^ S = -945/(647r 1 / 2 a :11 ) an d 

= 10395/(128^/2^13). 
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4. Estimate tpQ, ipg and tpio using kernel estimates ^e(go,i), ^8(51,1) and 
$10(01,1) where g 0>1 = {30/($* s n)} 1/9 , gi l = {-210/(V^ s ? i)} 1 /n an d 
ff i, 2 = {1890/(^ 2 s n)} 1 /i3. 

5. Estimate tp^, ipe and ips using kernel estimates ^(^o^); ^6(51,2) and 
$8(92,2) where 50,2 = [6/{$s (50,1 M] 1/7 , 01,2 = [-30/{$i (91,1 )n}] 1/9 and 

92 , 2 = [210/{$ 12 (( ?1 , 2 )7l}] 1 /ll. 

6. Obtain direct plug-in bandwidths for estimation of by re- 
placing t/v+2 in the optimal expression, with respect to asymptotic 
mean integrated squared error, by ip r+ 2(g r ,2) ■ Explicit expressions are 

h = [l/{27T 1 /2$ 4 ( 502 ) n }]l/5, h X = [-3/{4^ 1 /2^ 6(5l 2 ) n} ]l/7 and l 2 = 

[l5/{8n^Mg 2 , 2 )n}]^. 

7. Obtain pilot of estimates of /, /' and /" via Gaussian kernel estimates 
based on these bandwidths: /^ (-)> (•) and /£ (•). 

8. Use to obtain pilot estimates of f T , r and x\, . . . ,X2 r . 

9. Substitute the estimates from Steps 6 and 7^ into the^expressions for 
Bij, B<2j and B^j to obtain estimates B±j, Bij and B^j. 

10. The selected bandwidth for Gaussian kernel estimation of the 100(1 — 
t)% HDR is /i-rHDR = Coptn -1 / 5 where c opt = argmin cg(0 oo) AR n (c), where 
AR n was defined in (3.2). 

Binned approximations to Vv(<?) [cf. Gonzalez-Manteiga, Sanchez-Sellero 
and Wand (1996)] are strongly recommended to allow fast processing of 
large samples. An R function hdrbwQ that implements the above algorithm 
has been included in the package hdrcde [Hyndman (2009)] which supports 
HDR estimation. 

3.2. Simulation results. We ran a simulation study in which the perfor- 
mance of /i r HDR was compared with an established ISE-based selector: least 
squares cross validation [Rudemo (1982), Bowman (1984)] which we denote 

by ^LSCV- The number of replications in the simulation study was 250. The 
HDR estimation error fj,f(Rh T AR T ) was used throughout the study. Figures 
3 (n = 1000) and 4 (n = 100,000) summarise the results for the situation 
where the true / is the normal mixture density from Section 1 and Figure 1. 
The improvement gained from using the HDR-tailored bandwidth selector 
is apparent from the graphics, especially for the lower values of r. Wilcoxon 
tests applied to the error ratios showed statistically significant improvement 
of hrUDR at the 5% level for r = 0.2,0.5 and 0.8 when n = 100,000. For 
n = 1000, /i-rHDR performed better for r = 0.2,0.5, while /ilscv did better 
for r = 0.8. This latter result is not a big surprise since good estimation of 
-R0.8 requires good estimation of the finger-shaped modal region and this, in 
turn, requires good ISE performance. 
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1=0.2 x=0.5 t=0.8 




-0.5 0.0 0.5 1.0 1.5 -0.5 0.0 0.5 1.0 1.5 -1.5 -1.0 -0.5 0.0 0.5 1.0 



log 10 (ratio of estimation errors (LSCV/HDR)) log 10 (ratio of estimation errors (LSCV/HDR)) log 10 (ratio of estimation errors (LSCV/HDR)) 

Fig. 3. Summary of simulation comparison between /i t hdr and /ilscv for r = 0.2, 0.5 
and 0.8 and 250 samples of size 1000 generated from Density 4 of Marron and Wand 
(1992). The upper panels are scatterplots of the errors /j,f(Rh, T Ai? T ) for h = /ilscv on 
the vertical axes and h = /i t hdr on the horizontal axes. The lower panels are kernel density 
estimates of log 10 ((error for h = /i t hdr)/( error for h — /ilscv))- 

We performed similar simulation comparisons for the remaining Densities 
1-10 of Marron and Wand (1992). For n = 1000 the performance of /i t hdr 
was better than /ilscv for Densities 1-5; whereas /ilscv did better for Den- 
sities 6-10. This suggests that the asymptotics on which h T uDR relies have 
not "kicked in" at n = 1000 for these more intricate density functions. We 
suspect that more sophisticated pilot estimation might improve matters for 
HDR-based bandwidth selection for lower sample sizes. The n = 100,000 
simulations show superior performance of /i t hdr 5 especially r = 0.8 where 
it is the "winner" for 9 out of the 10 density functions. The overarching 
conclusion is that for common density estimation situations /i r HDR is better 
than h LS cv- 

3.3. Application to daily temperature data. We conclude with an appli- 
cation to data on daily maximum temperatures in Melbourne, Australia, for 
the years 1981-1990. These data were used in Hyndman (1996) to illustrate 
HDR principles. We revisit them armed with the automatic HDR estimation 
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t=0.2 t=0.5 t=0.8 




0.000 0.010 0.020 0.000 0.010 0.020 0.000 0.005 0.010 0.015 0.020 

errors for HDR bandwidth errors for HDR bandwidth errors for HDR bandwidth 




-1.0 0.0 0.5 1.0 1.5 -1.0 0.0 0.5 1.0 1.5 -1.5 -0.5 0.5 1.0 1.5 



log 10 (ratio of estimation errors (LSCV/HDR) logi (ratio of estimation errors (LSCV/HDR) log 1a (ratio of estimation errors (LSCV/HDR) 

Fig. 4. Summary of simulation comparison between /i t hdr and /ilscv for r = 0.2, 0.5 
and 0.8 and 250 samples of size 100,000 generated from Density ^ of Marron and Wand 
(1992). The upper panels are scatterplots of the errors Hf{Rh,r Ai? T ) for h = /ilscv on the 
vertical axes and h = /Ithdr on the horizontal axes. The lower panels are kernel density 
estimates of log 10 (( error for h = /i t hdr)/( error for h — /ilscv))- 

technology described in Section 3.1. Of interest are the conditional densi- 
ties of tomorrow's temperature given today's temperature is within a fixed 
interval. 

The intervals for the "today's temperature" values are, in degrees Celsius, 

[5, 10), [10, 15),..., [40, 45). 

Figure 5 shows the kernel estimates of the 20%, 50% and 80% HDRs with 
bandwidths chosen using the rule /i r HDR as detailed in Section 3.1. Some 
interesting bimodality in "tomorrow's temperature" is apparent when con- 
ditioned on today's temperature being in the 30-40 degrees Celsius range. 

APPENDIX: PROOFS 

Proof of Theorem 1. Throughout the proof, it is convenient to write 
xq = — oo and X2 r +\ = oo and adopt the convention that xq + a = — oo and 
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maximum daily temperatures in Melbourne, Australia 




10 20 30 



today's temperature (degrees Celsius) 

Fig. 5. Estimated kernel HDRs for the conditional densities of tomorrow's temperature 
given that today's temperature is in a fixed interval. The bandwidth for each HDR estimate 
is chosen using the selector described in Section 3.1. 



x 2r+i + a = oo for all a£M. Observe that 

/OO 
-OO 

j=0 Jx V 

j = l J X2j-l 



so that 



^ r r x 2j+i ^ ^ 

E{fi f (R h , T AR T )} = ^ / f(x)F(f h (x) > h, T ) dx 

j=0 Jx 23 

r r x 2j ^ ^ 

+ f(x)¥(f h (x)<f h , T )dx. 

~1 Jx2j-1 



The main idea of the proof is that the dominant contribution to E{fif(Rf lT AR T )} 
comes from a union of 2r small intervals, one near each Xj, where P(// 1 (x) > 
fh tT ) is close to 1/2. In each of these intervals, we can represent fh{x) — fh, T 
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by a sample mean of independent and identically distributed random vari- 
ables and a small additional remainder term, and hence apply a normal 
approximation to deduce the result. For clarity of exposition, we now split 
the proof into several steps: 

Step 1. As a preliminary step, let / = / + g be another uniformly con- 
tinuous density, and let f T = f T (f)- Writing || • for the supremum norm 
on the real line, we show that there exists C > 1 such that for all e > suffi- 
ciently small, we have \ f T — f T \ < Ce whenever ||ff||oo = 11/ — /|loo < £• To see 
this, let L = Y7j=i( x 2j-X2j-i) and choose C > l + 2L{|/ T ^Li \ f(x j )\ }~ 1 - 
The inverse function theorem [Burkill and Burkill (2002), Theorem 7.51] 
gives that for e £ R with |e| sufficiently small, we can write 

r 

{x: f(x) >f T + e} = [J [x 2 j-i +5 £: 2j-i,x 2 j - 6 £)2 j] 
i=i 

with 5 £ j = ^ + 0(e 2 ) as e — > 0. It follows that when e > is sufficiently 
small, and ||#||oo < e ; we have 

/oo 
•f( X ) 1 {/0r)>/ T -Ce} dx 
-oo 

{fix) ~ £}%(aO>/r-(C-l)e} 



> 



"2j-l 



= 1 - r+ E/ ^ x ) dx 

3=1 ^ x 2i-l+5_ e (C-l),2j-l 
J_ /'X'2j-<5_ e (C-l),2j 

3=1 ^ x 2i 
r 

-£^2{x 2 j - (5_ e ( C -l),2j - (^23-1 +5_ e( c-l),2j-l)} 

>i- T + i (0 -i )e / T g 17 ^ 5f - 2 .L>i-r. 

Thus f T >f T — Ce. A very similar argument yields the upper bound / r < 
f T + Ce, and this completes Step 1. 

Remark. Now, for 5 > small enough that / has two continuous deriva- 
tives in Is = Uj=i [ x 23-i — S, x 2 j + <5] , let || • || j 4i00 denote the supremum norm 
restricted to Is- It will be helpful in Step 4 to note that a small modification 
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of the above argument may be used to prove that if ||<?||oo and Hs'H/^oo are 
sufficiently small, and if 

r rx 2j +5 / 2r \ 

2 / \g(x)\dx = 0[^\9(x j )\) 

j=l J X2j-l-t \j = l J 

as EjLi Ifi^j)! ->■ 0, then f T -f T = 0(Y%=i b(^j)l) as ^jli IfffeOI °- 
Step 2. We show that for each fixed (5 > 0, 



^ P +1 /(x)P(A(x)> / fc>T )dx 
3=0 ^ 2j +<5 



(A.l) 



+ Z! n 5 f(xMfh(x)<fh,T)dx = o(n- 1 ] 

3=1 Jxsj-i+S 



as n — t- oo. In fact, we claim (and it will be straightforward to see) that 
the error term is of the stated order uniformly for h G [h~,h + ]. Indeed, we 
make a similar claim for every error term in each expression below where 
the bandwidth h appears, but we do not repeat this assertion in future 
occurrences. As in Step 1, observe that under (Al), if 5 > is sufficiently 
small, then there exists e > such that f(x)<f T — s for x G U^ot^j + 
5,x 2 j+i - 5] and f(x) > f T + e for x G U^it^-i + 8,%2j — S\. By reducing 
5 > if necessary, for x G Uj=o[ x 23 + $, %2j+i — $\, 

Hfhix) > fh,r) < nfh(x) - f( X ) - (fh,r - fr) > S) 

(A.2) < F(\\f h - /IU > e/2) + F(\f hsT - f T \ > e/2) 

<2v(\\f h -f\\ 00 >^ 

where we have used the result of Step 1 in the last inequality, and C is the 
constant defined in that step. A very similar argument yields the same upper 
bound for F(ff l (x) < fh. T ) when x G Uj=i[ x 2j-i +S,X2j — $]■ Now, since / is 
uniformly continuous under (Al), 



(A.3) ||E(A)-/|| 00 = sup 



K(z){f(x-hz)-f(x)}dz 



oo 







as n — > oo. The inequality (A.2), together with the observation (A.3) on the 
bias of fh, yields that for n sufficiently large, 

-<5 



£ T 2J+1 f(x)F(f h (x)>f h , T )dx 
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+ E P 5 f(xWh(x)<f h , T )dx 
Jx 2j -i+S 



<2PM|A-E(A)|| 00 >^j 

< exp(— cinhe 2 ) 

for some c\ > 0. Here, the final inequality is an application of Corollary 2.2 
of Gine and Guillou (2002) (a consequence of Talagrand's inequality) to the 
Vapnik-Cervonenkis class of functions {K{(x — •) /h) :x£l,/i>0} [cf. Dud- 
ley (1999), Theorems 4.2.1 and 4.2.4]. Equation (A.l) follows immediately, 
and this completes the proof of Step 2. 

Step 3. We show that (A.l) continues to hold if 5 is replaced by a 
sequence (5 n ) converging to zero, provided that 5 n — > slowly enough that 
n 1 / 4 ^ — > oo and (h + ) 2 = o{5 n ). In order to complete the proof of Step 3, it 
suffices to show that there exists 5 > such that 

'5/i 



E(5,6 n ) = ^2 F(f h (x)>f htT )dx 

J £'2i-l — <5 



__ l Jx 2 j-i- 

fX2j-l+S ^ ^ r rX2j-$n ^ ^ 

+J2 / nh(x)<f htT )dxj2 / F(f h (x)<f htT )dx 

j = l "* x 2j-l+8n j = l J X 2 j—5 



+ J2 r 3+& n7h{x)>h,r)dx = o{n~ 1 ). 
j—l J X2j+8 n 



We may assume 5 > is small enough that / has two continuous derivatives 
in Is- This enables a straightforward modification to the argument in (A. 3) 
using a Taylor expansion, leading to 

(A-4) \\E(f h )-f\\ Isj00 = 0(h 2 ). 

Now there exists a constant C2 > small enough that if we take e n = C2<5 n) 
then we have \f(x) — f T \ > e n when min,- \x — Xj\ > 5 n . Moreover, (h + ) 2 = 
o(e n ), so that for n sufficiently large, the same argument as in Step 2 yields 

E(5,5 n )<2F^\\f h -E(f h )\\ 00 >^ 

< exp(— citifies) = o(n~ 1 ). 



This completes the proof of Step 3. 
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Step 4. We seek asymptotic expansions for E(/^ )T ) and Var(// liT ). To 
this end, for uniformly continuous densities / = / + g that are twice contin- 
uously differentiable in 1$, and for y £ (0,oo), we define 



f(x)l 



{/»>?/} 



dx. 



The reason for making this definition is that by examining the behavior of ift 
under small changes of its arguments from (/, / r ), we will be able to study 
the difference fh T — f T in (A. 8) below. First, for e > sufficiently small, 



2r 



Hf, fr + e)- m fr) + ef r £ jj^~y } 



/OO -y 
^ fWMfr^HxXfr+e} dx + efr JJT^ 



(A.5) 



E 

j = l ^ Jx 2j-1 



%2j-l+8e,2j-l 



f(x) dx 



+ 



2r 



f(x)dx \ +efr^2-rjj 



{ \f'(*j)\ 



0(e 2 



as e \ 0. A very similar argument shows that the error term is of the same 
order as e f~ 0. 

Observe that when ||<?||oo an d Hs'Hi^oc are sufficiently small, / has a 
nonzero derivative in a neighborhood of each Xj. It follows that for suffi- 
ciently small values of ||<?||oo + I Iff' 1 1 7,5,00, we can write 



{x:f(x) > f T } = [j[x 2 j-i + 5 £i2 j-i +r)2j-i,x 2j - 6 £>2 j 
where e = f T — f T . Moreover, provided that 

r rx 2j +8 / 2r \ 

£ / \g(x)\dx = o X>(*i)l 

j=l J X2j-lS \ j = 1 ) 

and EjLi \9(xj)\ = 0(mmj \g(xj)\) as Ylf=i \d{xj)\ + Hs'H/a.oo -> 0, we have 
that rjj = ~ f f^l +0(\g(xj)\\\g'\\i s>00 ) as Ylf=i\9(xj)\ + \\g'\\i 5t00 ^ 0. Thus 
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we can write 



2r 



</>(/, fr) ~ W, fr) ~frJ2 T^l ~ E f" ^ ^ 



< 



2 c 



+ 



/( X )( 1 {/»>/ T } t {f(x)>f. 

9(x)^ { f (x) >f T y-Hf(x)>f T })dx 



])<h ' /t §I/'£)| 



(A.6) 



2c 



2r 



2r 



/T £jl/'(*i) 



{/ 2r \ 2 2r 



as X)i=i Iff^j)! + Ib'lUi.oo — ► 0. Assuming that ip(f,f T ) = 1 — r and that the 
above conditions on g hold, we have from (A. 5) and (A.6) that 



(A.7) 



= ^(/,/r)-^(/,/r) 

= Hf, fr) ~ W, fr) + f(f, fr) ~ fr) 
2r „ 2r 



2c 



2c 



+°|(X>(^)|J +ii9 / ii/„ooEi3(^: 



asE-LibfeOI + llff'Ikoc^o.^ 

We want to apply (A.7) with / = fh, so that g = fh — f. In order to do this, 
we must recall observation (A. 3) on the bias of fh, and the fact that \\fh — 

E(//i)||oo = Oa.s.( "^ ^p^r ) fr° m an application of Corollary 2.2 of Gine and 
Guillou (2002). It follows that \\f h - /(U a 4 0. Similarly, \\E(f h ) - f'\\ h>00 = 
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0(h 2 ), and a further application of Corollary 2.2 of Gine and Guillou (2002) 
gives \\f> h - E(g)\\ Is>00 = O a . s .(^#)- Thus \\f h - f\\ Is>00 a 4' 0. This in 
turn implies that with probability one, for n sufficiently large, fh T is the 
unique solution to tp(fhJh,r) = 1-r, or equivalently / M x )^ { f h{x) >f h t} dx = 
1 — r, as claimed in Section 2. It remains to note that 

e; =1 c_^iaw-/wi^ _ 0p(1) 

J2f=i\fh(xj)- f(Xj)\ 

and 

ET=i\h^)-f( Xj )\ 

— J - — - = O p (l). 

miiij \fh(xj) - f(xj)\ 
It follows that we can now substitute g = fh~ f m (A. 7) to deduce that 



A,r /r (£|/^)|j [E 1//^.)! 

(A.8) +tE fh{x)-f{x)dx 



fr j=1 Jx 2] -i 



" Upl ntf + „i/2 +ft 



Equation (A.8) shows that we can write the difference fh. T - / T as a sample 
mean of independent and identically distributed random variables and a 
small additional r emainde r term. Notice from the bandwidth condition on 

h~ in (A2) that ^ X °^J h) = o(h 2 ). Next, observe that 

where D\ is given in (2.1). Thus, in order to prove that 

(A.9) E(f h>T ) = f T + D 1 h 2 + o(h 2 ), 

it suffices by (A. 7) and Step 1 to show that for any rj > 0, 

n\Th,r -fr- £>i^ii {E ^ if h (x J )-/(x J oi+ii? h -/'ii/ J , < »>^) = o(/i2) - 
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But this follows by Cauchy-Schwarz, because Step 1 may be used to show 
that E(/£ r ) = O(l), and also 

p(E Ifhixj) ~ f(xj)\ > + n\\T h - f'W^oc > V/2) = oin" 1 ). 

We therefore deduce (A. 9). 

In a very similar way, we can also use (A. 7) and the fact that 

^ Var{/ fe (x J )} _Zj 2 [^ 1 I 2 /J_\ 

fri nxj) 2 nh\j^\r{ Xj )\l + Uv' 

where D2 is given in (2.1), to deduce that 

(A.10) Var&r)= ^ +0 (J_ 

Step 5. We can use the results of Step 4 to shrink the region of interest 
still further. From the result of Step 3 we can write 

E{n f (R hjT ARr)} 

fX2j-l+S n 



£ / f(x)\nfh(x)<f h , T )-l {x<X2j _ l} \dx 

j — l JX2j-l—S n 

(A.ll) +J2 f(x)\nf h (x)<f htT )-l {x > X23} \dx + o(n~ 1 ) 

r / .(nh) 1 /25„ 

V / \nfh(x2j-i + (nhyVH) < f h , T ) - l {t<0} \ 

1H J-(nh)V*8 n 



3=1 



fr 



' X2j-lS n 

r rx2j+S„ 
' X2j-8 n 

-(n/i) 1 / 2 ^ 

(nh) 1 / 2 j^J '-(nhy/^sj 
+ \nfh{x 2j + {nh)- l l 2 t) < / h)T ) - t {t > 0} \ dt + oin- 1 ). 

For brevity, we write x*- = Xj + (nh)~ l l 2 t. Now, for each j = 1, . . . , 2r, we see 
that for n sufficiently large, E{// l (x* ) — fh,r} is a strictly monotone function 
of t G [-(nh^Sn, {nh) l l 2 5 n ], with a unique zero t*-, say. Moreover, 

t* = {D l - i / x 2 (K)/"(x J )}{/ / (x,)}- 1 n 1 /2/ 1 5/2 {1 + o(1)} . 

Fix a sequence (t n ) diverging to infinity and let I" = [— (ra/1) 1 / 2 ^, (nh) l / 2 5 n ] \ 
[t* —t n ,tj+ t n ] . We claim that 

[ \Hfh(4 3 -l) < Th,r) - Mt<o } \dt 

3=1 ^ Jl 2j-l 
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(A.12) 



+ / inMxlj) < f h ,r) ~ !{*>(,} I <** ->• 



as n— > oo. Now there exists C3 > such that for all t £ Ujli-^ 1 an d 71 suf- 
ficiently large, we have |E{/^(x*-) — /fe, T }| > c^{nh)~ l / 2 t n . Thus there exists 
C4 > such that for all n sufficiently large, 



Wh^%-l)<h,r)-t{t< Q }\ 



< 



VarV2 {A (4._ l)} 

fh,r — ^(fh,T 



> c 4 t n 



Var 1 /2( //ijT ) 



> C4t n 



uniformly for t £ |Jj=i ^y-i- Since also I p (//i04j) < A,t) ~ l{t>o}l ~> uni- 
formly for t £ Uj=i ■'sy? we deduce (A.12). 

Step 6. We also require an asymptotic expansion for Cov(fh(xj), fh,r)i 
for t £ [i* — t n , i* + i n ]. In fact, provided (t n ) diverges sufficiently slowly, we 
have 



CovCA^),/^) 



3J 



nh 



+ o 



1 

n/i 



uniformly for t £ [£■ — t n ,t* + i n ], where is given at (2.1). This follows 
from the expansion (A. 7) and the fact that provided (t n ) diverges sufficiently 
slowly, 



E 



K 
1 

: T, 
1 

' h 



Xj — X\ 
h 



K 



h 



K(z)K 



(nh)-VH + hz 



f(xj — hz)dz 



frR^+oth- 1 ), 



uniformly for t £ [t* - t n , t* + t n }. 

Step 7. To complete the proof of Theorem 1, it suffices by (A. 11) and 
(A.12) to show that there exists a sequence (t n ) diverging to infinity such 
that 



_fr 

{nh) 1 / 



3=1 V "" , 2 3 -l 
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+ I \nfh(4 j )<fh,r)-t{t>0}\dt 
t 2 j —tn 

(nh) 1 / 2 



j=l L 



+ o[-^ + h 2 



(nh) 1 / 2 

For i = 1, ... ,n, let Z n i(x) = h^ 1 K \ x ~^ ) and let Y n = n _1 X^=i where 



2r 



ZnijXk) ~ f{ x k) 



+ Z ni (x) - f(x)dx 



By (A. 7) and (A. 8), we can write fh{ xt j) — fh,r = Y n + R n , where R n — 
E(R n ) = o p {{nh)~ 1 / 2 }. Since Var(y n ) = 0{{nh)~ 1 } uniformly for t G [t* - 
tn,tj +^n] 5 we choose (t n ) to diverge to infinity so slowly that: 

• p ( 'SrT%)' > i } - h uniforml y for * G \t) - + *n]; 

• (nh)Var(F n ) = i?(i^)/ T - 2D 3>j + L> 2 + o(t~ 1 ), uniformly for t G [t* - 

tn-i tj tn\ ) 

• E(Y n + i? n ) = {(nh)-V 2 tf(xj) + L> 4 /i 2 }{1 + o{t~ 1 )}, uniformly for t G 
[t* - t n ,t* + tn], where L> 4 = \n2{K)f"{ Xj ) - £>i; 

• ^ = 0(^/6). 

Then 



ru h [Xj) < Th,r) <p ( v{jR(k)/t _ 2£>3 . + £,^1/2 ; 

/ |E n -E(i4)| 1 
" V VarV2(y n) > £ 

/ y ra -E(y w ) -Efc + ik) i 

Uar 1 ^^)- Var 1 /^) ^ t 2 



-tf'ix^-Din 1 ' 2 ^/ 2 \ 
{R(K)f T -2D 3J + D 2 y/ 2 J 



t 2 (nhfl 2 ) V Var 1 /2(y n ) 
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o(t; 



-tf'( Xj ) - D^h 5 / 2 
{R(K)f T -2D 3j +D 2 y/ 2 



uniformly for t £ [t* — t n ,t* + t n ]. Here we have used the Berry-Esseen in- 
equality to reach the penultimate line. A very similar argument yields a 
lower bound of the same order. The proof of Step 7, and hence the proof of 
Theorem 1, is now completed by the observation that 



f, 



E 



(n/i)V2 z 



2r 

E 



/ -tf'(x 2j -i)-D 4 n 1 / 2 h 5 / 2 
\{R(K)f T -2D 3j +D 2 y/ 2 

( -tf'{x 2j )-Dm 1,2 h^ 2 
\{R(K)f T -2D 3d +D 2 y/ 2 

B ld <t>{B 2 jn l / 2 h^) 



L{t<0} 



L{t>0} 



(It 



(nh) 1 / 2 



+ B 3d h 2 {2^(B 2J n^ 2 h^ 2 )-l} 



Proof of Corollary 2. We may restrict attention to the case where n/i 5 
is bounded away from zero and infinity. The important point to note is that 
under the hypotheses of the corollary, B±j, B 2 j and B 3 j do not depend on 
j, so we write them as B%, B 2 and B 3 , respectively. 

By making the substitution x = B 2 n l l 2 h?l 2 , there exist positive constants 
Bz/(B\B 2 ) such that lim^oo n 2 / 5 E{fi f (R h)T AR T )} = 



a = 2B X B^ and b 



au(x), where u(x) = x 1//5 0(x) + 6x 4 / 5 {2$(x) - 1}. Since u is continuous 
with u{x) — > oo as x \ and x — > oo, it attains its minimum in (0,oo). To 
show this minimum is unique, it suffices to show that v(x) has a unique zero 
in (0, oo), where 



v(x) 



5x 6/5 
4>(x) 



u\x) 



-1 + 



Abx{2<$>{x) - 1} 

<j>(x) 



+ 5(26- l)x 2 



Now we have 



v'{x) = 2(146 - 5)z + 



v"(x) = 2(186- 5)+862; 2 + 



46(1 + x 2 ){2$(x) - 1} 

4>(x) 

46(3x + x 3 ){2<f>(x) 



1} 



4>{x) 



There are therefore two cases to consider: if 6 > 5/18, then v is strictly 
convex, so since v(0+) = — I and v(x) — > oo as x — > oo, we see that v has a 
unique zero in (0, oo). On the other hand, if 6 < 5/18, then there exists x* £ 
(0,oo) such that v"(x) < for x E (0, x*) and v"(x) > for x G (x*,oo). But 
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if b < 5/18 then v'(x) < 0, for sufficiently small x > 0, so from i>(0+) = —1, 
it again follows that v has a unique zero. 

Write x m i n for the unique minimum of u in (0, oo), and let c opt = (x m i n / B 2 ) 2 ^ 5 ■ 
We conclude that any optimal bandwidth sequence (h op t), in the sense of 
minimizing E{/i/(iJft |T A£,.)}, must satisfy h opt = c ov tn~ l l 5 {l + o(l)} as 
n —> oo. 

Proof of Theorem 3. We require a bound on \xj ^ — | for j = 1, . . . , 2r. 
To this end, let f = f + g be another density satisfying the same condi- 
tions as /. From Step 4 of the proof of Theorem 1, we see that for suffi- 
ciently small values of ||<7||oo + ||</||i 5 ,oo> there exist precisely 2r values x\ < 
■ ■ ■ < x 2r such that f(xj) = f T - Moreover, provided X)j=i Ix2--iS ls r ( x )l ^ x = 
(T,f=i \9(xj)\) as Y%Li \gi x i)\ + h'Wh,™ ->■ 0, we have xj-xj = 0(\g(xj)\) 
as Hf=i \g( x j)\ + llff'lU^oo ->■ 0. Substituting / = f ho , so that g = fh ~f and 

£ j = %fto> we nave l%ho ~ x il = P (^~ 2/5 )- 

It follows that D\ = D\ + O p (n~ 2 / 9 ) , the crucial fact being that (%,h ) — 

= O p (n- 2 / 9 ). Similarly, 5 2 = £> 2 + O p (n- 2 / 7 ) and D 3J = L> 3 j + Op(n" 2 / 7 ) 
for j = 1, . . . , 2r. Thus By = B hj + O p (n- 2 / 7 ), £ 2) j = B 2 j + Op(n" 2 / 9 ) and 
= B3J + O p (n~ 2 / 9 ). We deduce that for any < c\ < c 2 < 00, we have 
AR n (c) = AR(c){l + O p (n~ 2 / 9 )}, uniformly for c E [ci,c 2 ], and a standard 
Taylor expansion argument then gives that c op t = c op t{l + O p (n~ 2 / 9 )}. Both 
conclusions of the theorem follow immediately. 

Proof of Theorem 4. Let z n = S/h>2 , where 5 is small enough that / has 
12 continuous derivatives in UiLit^' — 5, Xj + 6]. Under the conditions of the 
theorem, we may integrate by parts twice and apply a Taylor expansion to 
obtain 



K 2 (z){f"(x j -h 2 z)-f"(x j )}dz 



+ o(hl°) 



O(hl ' 



<"2 )■ 

Thisjsxpression for the bias can be combined with the standard fact that 
Var fH 2 (xj) = 0{{nh 2 )~ 1 } and the bound on {xj^ — Xj\ from the proof of 

Theorem 3 to yield f'h 2 (xj,h ) — f"( x j) = O p (n~ 2 ^ 5 ). Similar computations 

give fL(%j,ho) ~ f'( x j) = O p (n~ 2 / 5 ). The rest of the proof mirrors the proof 
of Theorem 3. 
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